Boosting COVID-19 Image Classification Using MobileNetV3 and Aquila Optimizer Algorithm

Currently, the world is still facing a COVID-19 (coronavirus disease 2019) classified as a highly infectious disease due to its rapid spreading. The shortage of X-ray machines may lead to critical situations and delay the diagnosis results, increasing the number of deaths. Therefore, the exploitation of deep learning (DL) and optimization algorithms can be advantageous in early diagnosis and COVID-19 detection. In this paper, we propose a framework for COVID-19 images classification using hybridization of DL and swarm-based algorithms. The MobileNetV3 is used as a backbone feature extraction to learn and extract relevant image representations as a DL model. As a swarm-based algorithm, the Aquila Optimizer (Aqu) is used as a feature selector to reduce the dimensionality of the image representations and improve the classification accuracy using only the most essential selected features. To validate the proposed framework, two datasets with X-ray and CT COVID-19 images are used. The obtained results from the experiments show a good performance of the proposed framework in terms of classification accuracy and dimensionality reduction during the feature extraction and selection phases. The Aqu feature selection algorithm achieves accuracy better than other methods in terms of performance metrics.


Introduction
In December 2019, COVID-19 was declared as a new coronavirus which resulted in an explosive outbreak in China [1]. Due to its highly contagious characteristics, it swept over more than 220 countries with more than 200 million confirmed cases and more than 4.3 million deaths. This pandemic has the second rank among all documented pandemics based on the number of deaths after the 1918 flu pandemic [2]. More than 40% of these confirmed cases and deaths are reported in only three countries: namely the United States, Brazil, and India, as shown in Figure 1. The symptoms of this disease are fever, dry cough, loss of smell and taste, dyspnea, fatigue, and malaise [3]. It may produce acute complications for persons who suffer from other chronic diseases such as hypertension, respiratory system diseases, autoimmune diseases, diabetes, and cardiovascular diseases. Diagnosis of COVID-19 infection using X-ray imaging of the chest has been reported as an accurate diagnosis technique [4]. The conventional human-based detecting technique that depends on the technical experience of a physician or radiologist is inefficient, inaccurate, time consuming, limited, and outdated [5]. The implementation of this technique is subjected to human errors, resulting in the misdiagnosing of the disease. This problem is exacerbated in remote regions where there is a lack of expert physicians. The development of advanced artificial intelligence techniques (AI) allows medical researchers and scientists to develop advanced tools, software, and instruments that can help medical radiologists overcome the problems related to human-based detecting techniques [6]. The last two years have seen a surge in the applications of AI in the diagnosis and forecasting of COVID-19 [7][8][9][10][11][12]. Many approaches have been developed to detect and differentiate between COVID-19 disease and conventional viral pneumonia using chest X-ray and CT images [13]. Zhao et al. [14] investigated the relationship between COVID-19 pneumonia and CT images of the chest. The results revealed typical features observed in the examined images of COVID-19 cases; this finding allows researchers to apply AI in the image processing of chest X-rays and CT of COVID-19 cases. Bernheim et al. [15] reported that the CT images of the infected COVID-19 cases are characterized by the existence of typical hallmarks such as consolidative opacities, ground-glass opacities, and crazy-paving patterns. Pezzano et al. [16] developed a convolutional neural network (CNN) to detect ground-glass opacities in the CT images of COVID-19 infected cases. Yasin et al. [17] correlated the disease severity to patients' sex and age based on X-ray images.
The most common used AI reported in the literature to diagnose COVID-19 infections based on CT or X-ray images is CNN models such as VGG-16, VGG-19, Xception, AlexNet, ResNet50V2, CoroNet, LeNet-5, ResNet18, and ResNet 50 [18,19]. The integration between machine learning methods and the so-called metaheuristic (MH) optimization techniques [20,21] has also been reported in the literature as a correct approach with reasonable computational cost. Canayaz [22] developed a hybrid deep neural network-integrated with metaheuristic optimizers to diagnose COVID-19 infections. A dataset contains three groups of X-ray images, namely normal, pneumonia, and COVID-19, and was used to train the model. The images were preprocessed using the contrast-enhancing technique.
Features were extracted using deep learning models, namely GoogleNet, VGG19, AlexNet, and ResNet. The best features were selected using two metaheuristic optimizers, namely grey wolf optimizer and particle swarm optimizer. Then, the features were classified using a support vector machine.
An advanced hybrid classification approach consists of a CNN model and the marine predators optimizer, and the fractional-order algorithm has been developed to detect the infection of COVID-19 based on X-ray images [23]. CNN was used to extract features from the images, while the marine predators optimizer integrated with a fractional-order algorithm was used to select the essential features. The results obtained by the proposed approach were compared with those obtained by other metaheuristic optimizers such as henry gas solubility optimizer, slime mold algorithm optimizer, whale optimization optimizer, particle swarm optimizer, sine cosine algorithm, genetic algorithm, grey wolf optimizer, Harris hawks optimizer, and standalone marine predators optimizer. The proposed approach results revealed its excellent performance compared to the other algorithms in terms of high detection accuracy and low computational cost.
A hybrid metaheuristic optimizing the approach, in which the marine predators optimizer is incorporated with the moth-flame optimizer, was used for image segmentation of COVID-19 cases [24]. The latter optimizer was used as a subroutine in the former optimizer to avoid trapping into local optima. The proposed approach outperformed other advanced optimizers such as the Harris hawks optimizer, grey wolf optimizer, particle swarm optimization, grasshopper algorithm, cuckoo search optimizer, spherical search optimizer, moth-flame optimizer, and standalone marine predators optimizer. An improved cuckoo search optimizer using a fractional calculus algorithm was used to classify X-ray images of COVID-19 cases into normal, COVID-19, and pneumonia patients [25]. Four heavy-tailed distributions, namely Cauchy distribution, Mittag-Leffler distribution, Weibull distribution, and Pareto distribution, were utilized to strengthen the model performance. The proposed model showed its superiority over other feature selection techniques such as the genetic algorithm, Henry gas solubility optimizer, Harris hawks optimizer, salp swarm optimizer, whale algorithm, and grey wolf optimizer.
Another machine learning/metaheuristic optimizing approach was proposed to detect COVID-19 infections based on X-ray images [26]. The important features were extracted from the processed images using a machine learning algorithm called fractional exponent moments. The computational process was accelerated using a multicore computational scheme. Then, a hybrid manta-ray foraging/differential evolution optimizer was used to select the important features. The selection of essential features and excluding irrelevant features help accelerate the classification process, which is accomplished using the k-nearest neighbors technique.
However, most of the presented COVID19 image classification methods have limitations that affect classification accuracy. It has been noticed that these limitations result from either the strategy used to extract the features or the approach used to reduce the number of selected features. Therefore, this motivated us to present an alternative COVID-19 image classification method.
Within this study, we developed an alternative COVID-19 image classification technique that combined the advantages of MobileNetV3 and a new MH technique named Aquila Optimizer (Aqu) [27]. The MobileNetV3 is used to extract the features from the tested images and then, using the binary version of Aquila Optimizer (Aqu) as a feature selection (FS) method, to determine the relevant features. Aqu algorithm has established its performance in several applications, including oil production forecasting [28], and global optimization [29].
The main contributions of this paper are summarized as follows: 1.
Develop a COVID-19 cases detection framework by incorporating MobileNetV3 and Aquila Optimizer as feature extraction and selection algorithms, respectively.

2.
Propose a new feature selection using the binary version of Aquila Optimizer, in addition, using MobileNetV3 to learn and extract the image embedding from the COVID-19 images.

3.
Evaluate the performance of the developed method using two datasets with X-ray and CT images of COVID-19.

4.
Compare the efficiency of the developed approach with other methods.
The structure of the remaining parts of this study is as follows: Section 2 introduces the background of MobileNetV3 and Aquila Optimizer algorithm. Section 3 presents the stages of the developed method. The comparison results are given in Section 4. Finally, we introduce the conclusion and future work of the current study in Section 5.

MobileNetV3
Convolutional neural network architectures have been proposed recently to tackle many different problems and improve their performance in terms of speed and size. Efficient convolutional neural networks implementing the depthwise convolution structure such as NASNet [30], MobileNets [31,32], EfficientNet [33], MnasNet [34], and Shuf-fleNets [35] are considered as a key technique in many computer vision applications [36][37][38][39] known by fast training process. The depthwise convolutional kernel is a learnable parameter applied to each input channel separately from the training images to extract spatial information. Moreover, depthwise convolutional kernels are shared across all input channels, increasing model efficiency and reducing computation cost. However, the depthwise convolutional kernel size can be difficult to learn, thus increasing the complexity of the training process of the depthwise convolutions. In the upcoming paragraphs, we briefly discuss the recently proposed MobileNetV3 [32] architecture.
Previously developed MobileNetV1 and MobileNetV2 were improved with a new version called MobileNetV3, proposed by Howard et al. [32] using network architecture search (NAS). The used NAS technique called NetAdapt algorithm was used to search for the best kernel size and find the optimized MobileNet architecture to fulfill the lowresourced hardware platforms in terms of size, performance, and latency. The MobileNetV3 introduced several building components and blocks inspired by the previous versions, as shown in Figure 1. In addition, MobileNetV3 possesses a new nonlinearity called hard swish (h − swish), which is a modified version of the sigmoid function introduced in [40]. The h-swish nonlinearity is defined as in Equation (1), which is employed to minimize the number of training parameters and reduce the model complexity and size.
where σ(x) represents the piece-wise linear hard analog function. As shown in Figure 1, the MobileNetV3 block contains a core building block called the inverted residual block, which includes a depthwise separable convolution block and a squeeze-and-excitation block [34]. The inverted residual block is inspired from the bottleneck blocks [41], where it uses an inverted residual connection to connect the input and output features on the same channels and improve the features representations with low memory usage. The depthwise separable convolutional contains a depthwise convolutional kernel applied to each channel and a 1 × 1 pointwise convolutional kernel with batch normalization layer (BN) and the ReLU or h − swish activation functions. The depthwise separable convolutional is used to alter the traditional convolution block and reduce the model capacity. The squeeze-and-excitation (SE) block is used to pay more attention to the relevant features on each channel during training.

Aquila Optimizer (Aqu)
Aquila Optimizer (Aqu) [27] is a new population-based optimizer that is classified as a metaheuristic optimization technique. The mathematical formulation of this optimizer is presented in this section. The social behavior of Aquila inspires the Aqu algorithm during the hunting process of its prey. Like other population-based metaheuristic techniques, Aqu starts with N agents with an X initial population. This initialization process is executed using the following formula.
where LB j and UB j are the lower and upper bounds of the exploration domain. r 1 ∈ [0, 1] is a randomly generated parameter, and Dim is the population size.
Once the population is initialized, the algorithm executes exploitation and exploration processes until the optimal solution is obtained. There are two main implemented strategies during exploitation and exploration processes [27].
The first strategy is implemented to execute the exploration process considering the average agents (X M ) and the best agent X b . This strategy is mathematically formulated as follows: where T is the total number of iterations, while the search process is controlled using In the second strategy, the exploration of the agents is updated based on the Levy flight (Levy(D)) distribution and X b . This strategy is mathematically formulated as follows: where β = 1.5 and s = 0.01, while υ and u are randomly generated parameters. In Equation (6), X R is a randomly selected agent. Moreover, x and y are used to follow the spiral tracking shape, and they are mathematically formulated as follows: where U = 0.00565 and ω = 0.005. r 1 ∈ [0, 20] is a randomly generated parameter.
In [27], the first strategy is utilized to update the agents during the exploitation process based on X M and X b , and it is mathematically formulated as follows: where δ and α denote the adjustment parameters of exploitation process. rand ∈ [0, 1] is a randomly generated parameter.
In the second strategy, the agent is updated during the exploitation process using the quality function QF, and X b , Levy. This strategy is mathematically formulated as follows: Furthermore, G 1 specifies the employed motions during tracking the best solution, and it is given as: rand is a function that generates random values, and G 2 specifies decreased values from 2 to 0, and it is given as:

Proposed Framework
In this section, the general framework of the developed COVID-19 image classification method is described.

MobileNetV3 for Feature Extraction
The fine-tuning process of MobileNetV3 and feature extraction phase are described in this section. The main objective is to extract relevant image embeddings relying on a pretrained model on different COVID-19 image datasets. Meanwhile, the extracted image embedding in this phase is fed into the feature selection phase, discussed in the next section. Compared to previous studies, the feature selection phase employs a new swarm optimization technique to enhance the recognition accuracy, select only essential features, and reduce the features representation space of the overall proposed framework.
As described in Section 2.1, efficient convolutional neural networks such as Mo-bileNetV3 [32] act as suitable models to perform image recognition where they can act as a core component in the feature extraction phase. We used a pretrained model of Mo-bileNetV3 trained on the ImageNet dataset to avoid training the model from scratch and speed up the learning process. More specifically, the MobileNetV3-Large pretrained model was used in our experiments and adapted to the COVID-19 recognition task via transfer learning and fine-tuning. We follow the standard procedure to fine-tune the MobileNetV3 model and extract the relevant image embeddings. First, we change the top two output layers of the MobileNetV3 model used for image classification with a 1 × 1 point-wise convolution to extract images features. The 1 × 1 point-wise convolution can act as a multilayer perceptron (MLP) to perform image classification or dimensionality reduction by integrating different nonlinearity operations. In addition, other 1 × 1 point-wise convolutions have been added on the top of the model for fine-tuning the model's weights on different datasets based on the classification task. Second, after fine-tuning the model, we flatten the output of the 1 × 1 point-wise convolution used for feature extraction to generate image embeddings with the size of 128 for each image in the dataset. Lastly, the extracted image embeddings are fed into the feature selection phase. Figure 2 shows the architecture of the modified MobileNetV3 for COVID-19 images feature extraction. The feature extraction phase was performed after fine-tuning the model for 100 epochs during ten randomly initialized runs where we used the model resulting in the highest classification accuracy on each dataset. A batch of size 32 and a stochastic gradient descent approach named RMSprop were used to fine-tune the model with a learning rate set to 1 × 10 −4 . Data augmentation was employed during the data preprocessing phase to overcome overfitting and improve the model's generalization. The data augmentation transformation such as random crop, random horizontal flip, color jitter, and random vertical flip was used alongside original image resizing to shape 224 × 224.

Developed Aqu FS Algorithm
To apply the Aqu algorithm as an FS method, its binary version is developed as given in Figure 3. The main target of this conversion is to prepare the Aqu algorithm for working with the discrete problem since its original version is implemented to work with real-valued problems only. There are two stages of Aqu as FS technique, and the details of each stage are given in the following sections.

First Stage: Learning of Model
This stage aims to use the training set to learn the model to select the most relevant features, and in this study, we used 70% from the dataset as a training set. The first process in this stage is to set the initial value for the population X, which contains N agents. This process is defined as: X i = rand * (U − L) + L, i = 1, 2, . . . , N, j = 1, 2, . . . , N F (15) where N F represents the number of features, whereas U and L are the limits of search domain.
The next step is to obtain the binary form for each X i , and this was produced using Equation (16).
Then, the fitness value Fit i of each X i is evaluated using the following formula: In Equation (17), |BX i | stands for the number of features (i.e., the ones in BX i ), whereas γ i represents the classification error using the KNN classifier that used the reduced training set based on BX i . In addition, λ is a weight value used to balance between the two objectives in Equation (17) (i.e., minimizing the selecting features and reducing the error of classification). After that, the agent with the best fitness value (Fit b ) is considered the best agent X b . The X b agent is used to update the other agents according to the operators of the Aqu algorithm as discussed in Equations (4)- (14).
The next step is to check if the terminal conditions are met, then X b is returned; otherwise, updating the solutions is conducted again.

Second Stage: Evaluation of the Selected Features
Within this stage, the relevant features in the best solution X b are used to reduce the testing set used as input to the KNN classifier. Later, we compute the performance of the predicted output using various performance measures.

Dataset Description
This section describes the datasets used in the COVID-19 detection task and the distribution of their corresponding samples. The datasets include two types of images: X-ray and CT scan images (computed tomography scan), where Figure 4 shows examples from each dataset. Our experiments used three different datasets to train and fine-tune the feature extraction model, namely the COVID-CT dataset (dataset1), the COVID-XRay-6432 dataset (dataset2), and the COVID-19 radiography dataset (dataset3). We keep the same data split after extracting image embeddings from each dataset which are fed to a feature selection and classification phase. In the following section, a detailed description of each dataset is given.    Table 3 lists the number of samples in each class after splitting the data.

Performance Metrics
To assess the accuracy of the developed model, some statistical parameters were computed, such as the mean of best values, the mean of the worst values (Max), standard deviation, and computational time elapsed during the selection of features. Then, statistical measures were computed during the classification phase. The mathematical form of these measures are given as: Speci f icity = TN TN + FP (20) where "TP" is the abbreviation of true positives and represents the positive COVID-19 images labeled using the proposed classifier correctly. "TN" stands for the true negative samples and represents the negative COVID-19 images that were labeled using the proposed classifier correctly. "FP" is the abbreviation of false positives and represents the positive COVID-19 images labeled using the proposed classifier incorrectly, while "FN" is the abbreviation of false negatives and represents the negative COVID-19 images that were labeled using the proposed classifier incorrectly.
• Best accuracy: where r denotes the run numbers. Fit i represents a fitness function value.
To validate the performance of Aqu as an FS method, its results were compared with other well-known FS methods based on MH techniques. For example, whale optimization algorithm (WOA) [43], moth-flame optimization (MFO) [44,45], firefly algorithm (FFA) [46], bat algorithm (BAT) [47], hunger games search (HGS) [48], transient search optimization (TSO) [49], and Aquila Optimizer (Aqu) [27]. In this paper, the parameters of these FS methods are assigned based on the original implementation of each method. However, the common parameters, such as the number of iterations and population size, are set to 20 and 15, respectively. In addition, each FS method conducted 25 runs for a fair comparison between them. All DL training and feature extraction phases were conducted on a GPU (Graphics processing unit) of type GTX1080 from Nvidia, while the feature selection phase has experimented on the Google collaboratory platform. For a proper validation of the framework, other DL models such as DenseNet, VGG19, and EfficientNet were exploited as backbone feature extraction methods using their standard architecture and parameters.

Results and Discussion
In this subsection, the performance of the developed model is evaluated using two datasets as given in Tables 4 and 5 and Figures 5 and 6. In general, it can be noticed from Table 4 that the developed method can improve the performance of classification accuracy among the two tested datasets. For example, to analyze the performance of the developed Aqu over Dataset1, the following points can be observed: firstly, the accuracy of Aqu is better than other methods, which nearly have a difference between the best second algorithm (i.e., HGS) with 1.2%. Secondly, Aqu has a higher Recall, Precision, and F1 score overall than the comparative FS methods such as HGS and BAT, which allocated the second and third ranks, respectively. Similar to dataset1, the efficiency of Aqu using dataset2 is the best in terms of classification accuracy, followed by BAT and HGS. In addition, the recall value of HGS and MFO is better than all other methods (i.e., WOA, FFA, BAT, and TSO) except for the developed Aqu method, whereas the precision value of WOA and TSO allocate the second rank after the Aqu method, which allocates the first rank. In order to analyze the results of the F1-score obtained using the second dataset, it can be observed that the HGS is better than other algorithms, which allocate the second rank after the Aqu algorithm.    Figure 5 depicts the average of each algorithm in terms of accuracy, recall, precision, and F1-score. It can be seen from this figure that the average of the Aqu algorithm over the two datasets is better than other methods in terms of performance measures.
Moreover, the time computational of the FS methods is computed to justify their time complexity as given in Table 5. We can notice from the CPU time(s) values that the developed Aqu algorithm has the shortest time in dataset1. However, the CPU time(s) of the Aqu algorithm over datasets is the second-best one that followed the TSO algorithm. Meanwhile, the efficiency of Aqu to reduce the number of features is observed from the number of selected features (i.e., the #FS column). Aqu has the smallest number of features, 130 and 140 at Dataset1 and Dataset2, respectively. In addition, from Figure 6 which shows the average over the two datasets in terms of CPU time(s) and #FS, the high superiority of Aqu over other methods can be seen.

Comparison with Other CNN Types
In this section, the performance of the developed method that combines the Mo-bileNetV3 and Aqu is compared with the other three CNN types networks. These network include VGG19 [50] (Visual Geometry Group), DenseNet [51] , and EfficientNet [33]. The main aim of this comparison is to assess the ability of MobileNetV3 to extract the relevant features.
The comparison results between the MobileNetV3 and other CNN types are given in Table 6. From these results, it can be seen that the MobileNetV3 can provide better performance than other CNNs followed by DenseNet that has a high ability to extract relevant features better than the other two networks (i.e., VGG and EfficientNet). The same observation can be noticed in Figure 7, which depicts the average of the accuracy in all the tested FS methods using the feature extracted from each CNN type. In addition, the performance of Aqu based on MobileNetV3 in terms of accuracy among the two tested datasets is given in Figure 8. From these averages, it can be noticed that the developed method provides better results than others. In addition, the ability of Aqu to increase the accuracy classification is better than other FS methods when using different CNN types.

Influence of the Size of COVID19 Dataset
In this section, the influence of using a large number of images on the performance of the developed method is assessed using a third dataset (i.e., COVID-19 radiography) described in Section 4.1. Table 7 shows the average of the results in terms of performance measures for each FS algorithm that depends on the features extracted using MobileNetV3. From these results, one can reach the following observations: firstly, Aqu has a high ability to enhance the classification accuracy; in addition, it can reduce the number of features required to increase the classification accuracy. However, the Aqu allocates the second rank after TSO in CPU time (s) required to determine the relevant features.

Conclusions
This study developed a framework to detect the COVID-19 cases from X-ray and CT images using three datasets with a considerable amount of samples. The proposed framework depends on the combination of the MobileNetV3 DL model and metaheuristic (MH) techniques. Furthermore, three other DL networks were included in our experiments, namely VGG19, DenseNet, and EfficientNet. For instance, MobileNetV3 was used to extract the features from all existing images in each dataset. By contrast, a new MH technique named Aquila Optimizer (Aqu) was proposed for feature selection (FS) by converting it to binary. The extracted image embeddings from each DL network were fed to the FS algorithm for feature space reduction and classification performance improvement. To justify the performance of the developed method, three datasets are used with different characteristics since they represent X-ray and CT COVID-19 images collected from different sources. The comparison results illustrated the high performance of the developed method based on the Aqu method over the other competitive methods.
Besides the promising results, the developed method can be extended to other applications such as agriculture, remote sensing, galaxy classification, and other image classification tasks.
Author Contributions: M.A.E.: Conceptualization, supervision, methodology, formal analysis, resources, data curation, and writing-original draft preparation. A.D.: Conceptualization, supervision, methodology, formal analysis, resources, data curation, and writing-original draft preparation. A.H.E.: Conceptualization, supervision, methodology, formal analysis, resources, data curation, and writing-original draft preparation. A.I.S.: Supervision and writing-review and editing, methodology, formal analysis, resources, and data curation. M.A.: supervision and writing-review and editing, methodology, formal analysis, resources, and data curation. N.A.A.: supervision and writing-review and editing, methodology, formal analysis, resources, and data curation. All authors have read and agreed to the published version of the manuscript.
Funding: This research received no external funding.